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Abstract 

This second paper of two companion papers on the estimation of power spectra 
specializes to the topic of estimating galaxy power spectra at large, linear 
scales using maximum likelihood methods. As in the first paper, the aims are 
pedagogical, and the emphasis is on concepts rather than technical detail. The 
paper covers most of the salient issues, including selection functions, likelihood 
functions, Karhunen-Loeve compression, pair-integral bias, Local Group flow, 
angular or radial systematics (arising for example from extinction), redshift 
distortions, quadratic compression, decorrelation, and disentanglement. The 
procedures are illustrated with results from the IRAS PSCz survey. Most of 
the PSCz graphics included in this paper have not been published elsewhere. 


1 Introduction 

This paper addresses the problem of estimating galaxy power spectra at linear 
scales using maximum likelihood methods. As discussed in Paper 1, the power 
spectrum is the most important statistic that can be measured from large scale 
structure (LSS), and, again as elaborated in Paper 1, maximum likelihood has 
a special status in statistics: if a best method exists, then it is the maximum 
likelihood method (Tegmark, Taylor & Heavens 1997 |45|L 

Fisher, Scharf & Lahav (1994) ^1] were the first to apply a likelihood 
approach to large scale structure. Heavens & Taylor (1995) m may be 
credited with accomplishing the first likelihood analysis designed to retain 
as much information as possible at linear scales. The primary goal of both 
Fisher et al na and Heavens & Taylor |22] was to measure the linear redshift 
distortion parameter [3 w from the IRAS 1.2 Jy survey. Shortly 

thereafter, Ballinger, Heavens & Taylor (1995) pQ extended the analysis to 
include a measurement of the real space (as opposed to redshift space) linear 
power spectrum of the 1.2 Jy survey in several bins of wavenumber. 
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Several authors have built on and applied the Heavens & Taylor’s (1995) 
[221 method. Tadros et al (1999) [27] and Taylor et al (2001) |2E1 applied 
improved versions of the method to measure the power spectrum and redshift 
distortions in the IRAS Point Source Catalogue redshift survey (PSCz; 
Saunders et al 2000 1221). More recently, Percival et al (2004) (221 applied 
the method to the final version of the 2 degree Field Galaxy Redshift Survey 
(2dFGRS; Colless et al 2003 (HI). 

The present paper aims at a pedagogical presentation of the various issues 
involved in carrying out a maximum likelihood analysis of the galaxy power 
spectrum and its redshift distortions. The paper is based largely on our own 
work over the last several years (Hamilton, Tegmark & Padmanabhan 2000 
El; Padmanabhan, Tegmark & Hamilton 2001 El; Tegmark, Hamilton & Xu 
2002 221; Tegmark et al 2004 23)- The procedures are illustrated here mainly 
with measurements from the IRAS PSCz survey (Saunders et al 2000 jHHjl. 
The measurements are based on the work reported by Hamilton, Tegmark 
& Padmanabhan (2000) |22, but most of the graphics in the present paper 
appear here for the first time. 

The paper starts, by showing the final real space linear power spectra 
measured by the methods described herein from the PSCz, 2dF, and SDSS 
surveys. The remainder of the paper is organized into sections each dealing 
with a specific aspect of measuring the linear power spectrum: selection 

function; H linear vs. nonlinear regimes; m Gaussian likelihood function; m 
numerical obstacle; [J3Karhunen-Loeve compression; ^removing pair-integral 
bias; Local Group flow; ^ilOl isolating angular and radial systematics; m 
redshift distortions and logarithmic spherical waves; m quadratic compres¬ 
sion; [^[decorrelation; and ;il4l disentanglement. Finally, 1 51 summarizes the 
onclusions. 


2 Results 

Figure^ taken from Tegmark et al (2004) |41|. shows the linear galaxy power 
spectra measured, by the methods described in this paper, from the PSCz 
(Saunders et al 2000 |2Sj), 2dF (Colless et al 2003 |2|), and SDSS (York et al 
2000 20]) surveys. 

Two important points to note about these power spectra are, first, that 
the power spectra have redshift distortions removed (fflU), and are therefore 
real space power spectra, and second, that the power spectra have been 
decorrelated (03, so that each point with its error bar represents a sta¬ 
tistically independent object. Each point represents not the power at a single 
wavenumber, but rather the power in a certain well-defined band (El. 
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Fig. 1. From Tegmark et al (2004) 1411 . Decorrelated, real space (not redshift 
space) linear power spectra measured from the PSCz survey (Hamilton, Tegmark 
& Padmanabhan 2000 1211 1. the 2dF 100k survey (Tegmark, Hamilton & Xu 2002 
1431 1. and the SDSS survey (Tegmark et al 2004 1411 1. 


3 Selection Function 

A prerequisite for measuring galaxy power spectra, whether at linear or 
nonlinear scales, is to measure the angular and radial selection functions of a 
survey. All I really want to say here is that selection functions do not grow on 
trees, but require a lot of sometimes unappreciated hard work to measure. 

As regards the angular selection function, one thing that can help cut down 
the work and improve accuracy is a software package MANGLE that we recently 
published (Hamilton & Tegmark 2004 HD- Max tells me that mangle has 
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become the de facto software with which the SDSS team are characterizing 
the angular selection function of the SDSS. 

Measuring the radial selection function is similarly onerous. The principal 
goal of all the methods is to separate out the smooth radial variation of the 
selection function from the large variations in galaxy density caused by galaxy 
clustering. Invariably, the essential assumption made to effect this separation 
is that the galaxy luminosity function is a universal function, independent 
of position. One of the best papers (in terms of content, as opposed to 
comprehensibility) on the subject remains the seminal paper by Lynden-Bell 
(1971) 1201, who applied a maximum likelihood method. Other papers include: 
Sandage, Tammann & Yahil (1979) [22, Choloniewski (1986, 1987) [01 Cj, 
Binggeli, Sandage & Tammann (1988) 0], Efstathiou, Ellis & Peterson (1988) 
im, SubbaRao et al (1996) |201) Heyl et al (1997) (221) Willmer (1997) [^ . 
Tresse (1999) |T7| . 


4 Linear vs. Nonlinear Regimes 

There is a big difference between the linear and nonlinear regimes of galaxy 
clustering, and it makes sense to measure the power spectrum using different 
techniques in the two regimes. 

At linear scales, it is legitimate to assume a much tighter prior than at 
nonlinear scales. On the other hand, at linear scales there are far fewer modes 
than at nonlinear scales. 

At linear scales, you can reasonably assume (with various degrees of 
confidence): 

• Gaussian fluctuations; 

• redshift distortions conform to the linear Kaiser (1987) m model; 

• linear bias between galaxies and matter. 

At nonlinear scales, all of the above assumptions are false, and it would be 
an error to assume that they are true. There is however one useful assumption 
that is a better approximation at nonlinear scales than at linear scales: 

• redshift distortions are plane-parallel (the distant observer approxima¬ 
tion). 

The remainder of this paper devotes itself to the problem of measuring 
power at linear scales. 


5 Gaussian Likelihood Function 

Measuring the power spectrum of a galaxy survey at linear scales starts with 
the fundamental prior assumption that the density field is Gaussian. With 
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this assumption, one has the luxury of being able to write down an explicit 
Gaussian likelihood function 



( 1 ) 


where Si is a vector of measured overdensities, Cij is their covariance matrix 
(part of the prior) and \C\ is the determinant of the covariance matrix. 
Normally, the covariance matrix is assumed to be a sum of two parts, a cosmic 
part and a shot noise part - see below. 

It should be emphasized that the assumption of Gaussian fluctuations 
is not valid at nonlinear scales, and it would be wrong to assume that 
the likelihood CJ holds at nonlinear scales. If you use the likelihood o at 
nonlinear scales, then you will substantially underestimate the true error bars 
on the power spectrum. 

6 Numerical Obstacle 

Perhaps the biggest single obstacle to overcome in carrying out a Gaussian 
likelhood analysis is a numerical one, the limit on the size of the covariance 
matrix C that the numerics can deal with. Suppose that you have N modes 
in the likelihood function. Manipulating the N x N covariance matrix C is 
typically an process. Thus doubling the number of modes takes 2^ = 8 
times as much computer time. 

To make matters worse, consider the fact that the number of modes in a 
survey increases as roughly the cube of the maximum wavenumber (smallest 
scale) to which you choose to probe, N ~ fcmax- The numerical problem thus 
scales as /c)(jax- other words, if you want to push to half the scale, by 
doubling femax, then you need 2^ = 8 times as many modes, and 2® = 512 
times as much computer time. 

Fortunately, the linear regime is covered with a tractable number of modes. 
The boundary between linear and nonlinear regimes is at fc ss 0.3/iMpc~^. 
If you prefer to remain safely in the linear regime, you may prefer to stick 
to fc < O.l/iMpc”^, as did Heavens & Taylor (1995) [22]. Max Tegmark 
and I typically have used about 4000 modes (a week of computer time on a 
workstation), which took us to fc 0.25/iMpc“^ in the PSGz survey, and 
k « 0.15 /iMpc“^ in the 2dF 100k and SDSS surveys. 

The previous paragraph would seem to suggest that 4000 modes is fine, but 
I have to confess that although our coverage of information is close to 100% at 
the largest scales, we lose progressively more information at smaller scales. To 
catch all the information available at say k ^ O.l/iMpc”^, we should really 
push to 10^ or 10® modes. 

How does one deal with the numerical limitation of being able to use only 
a finite number of modes? One thing is to make sure that your code is as fast 
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as can be. Several years ago Max Tegmark taught me some tricks that can 
speed up matrix manipulations by 1 or 2 orders of magnitude (without which 
our code would take a year to run instead of a week). It helps to time the 
various steps in the code, and to oil the bottlenecks. 

The other big thing to get around the numerical limitations is to use the 
Karhunen-Loeve technique to compress the information of interest - power at 
linear scales - into a modest number of modes. 


7 Karhunen-Loeve Compression 

The idea of Karhunen-Loeve (KL) compression is to keep only the highest 
signal-to-noise modes in the likelihood function. This idea was first proposed 
by Vogeley & Szalay (1996) @S| for application to LSS. 

Suppose that the covariance matrix C is a sum of a signal S (the cosmic 
variance) and noise N (the shot noise) 

C = S + N. (2) 

Prewhiten the covariance matrix, that is, transform the covariance matrix so 
that the noise matrix is the unit matrix: 

+ 1 (3) 

where the 1 on the RHS is to be interpreted as the unit matrix. Diagonalize 
the prewhitened signal: 

Ar-i/ 25 .^- 1/2 ^ (4) 

where O is an orthogonal matrix and A is diagonal. Since the unit matrix 
remains the unit matrix under any diagonalization, the covariance matrix is 

= 0(A-k l)OT , (5) 

The resulting eigenmodes, the columns of the orthogonal matrix O, are 
Karhunen-Loeve, or signal-to-noise, eigenmodes, with eigenvalues A equal 
to the signal-to-noise ratio of each mode. 

The KL procedure thus parcels the information in a signal into a discrete 
set of modes ordered by their signal-to-noise. This is definitely a very neat 
trick. 

However, there is a big drawback to KL compression, which is that you 
typically want to extract N modes from a much larger pool of TV ^ modes 
- that’s why it’s called compression. And that requires diagonalizing a large 
TV X TV matrix. But the whole point of KL compression is to avoid having to 
mess with a large matrix. 

Fortunately, there is a way out of this loop. The first point to note is that 
the thing of interest is the ensemble of KL modes, not each mode individually 
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Fig. 2. Selection of angular Karhunen-Loeve modes in the PSCz survey with the 
high-latitude mask. The projection is Hammer-Aitoff in galactic coordinates, with 
the Galactic Centre at centre. The top 4 modes are special, while the rest are 
constructed from the KL procedure. The angular modes are, from top left to bottom 
right: mode 1, the (cut) monopole mode; modes 2-4, the three (cut) dipole modes 
(with small admixtures of cut monopole to make them orthogonal to the monopole); 
then modes 5, 10, 20, 40, 80, and 160. All modes are mutually orthogonal over the 
unmasked part of the sky. The modes are finite sums of harmonics up to Z = 39. 
Mode 5 (left middle) is the (non-special) KL mode containing the most information 
about large scale angular power. The mode evidently “knows about” the PSCz 
angular mask: the mode has low amplitude in masked regions of the survey, and 
high amplitude in unmasked regions. 
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Fig. 3. The first 5 radial Karhunen-Loeve modes associated with the (cut) monopole 
angular mode in the PSCz survey. The first two radial modes are special: mode 1 
(thick blue) is the mean mode, and mode 2 (medium green) is the Local Group (LG) 
motion mode (with an admixture of the mean mode to make it orthogonal to the 
mean). The remaining radial modes are constructed from the KL procedure. The 
modes are all mutually orthogonal over the (unshaded) interval from 1 h“^Mpc to 
420 h~^Mpc. The modes are finite sums of logarithmic radial waves (313 defined, to 
avoid aliasing, over the extended logarthmic interval 10“^ /i“^Mpc to 10“* /i“^Mpc. 
Radial KL modes associated with other angular KL modes are similar but not 
identical. For non-special angular modes (i.e. angular modes other than the cut 
monopole and dipole) it is optional whether or not to force the first mode(s) to be 
special. Nowadays we tend to keep the mean but not the LG radial mode in non¬ 
special angular modes. Keeping the mean radial mode make it possible to test for 
possible purely angular systematics, such as might be associated with extinction. 


(though they are cute to look at - see Figures [21 & El- Thus the KL modes do 
not have to be perfect. The second point is that what you call “signal” is up 
to you. In the case under consideration the signal is “the power spectrum at 
linear scales”, which is actually not a single signal but a whole suite of signals. 

Our strategy to solve the KL crunch is to compress first into a set of 
angular KL modes (Figure (21); and then within each angular KL mode to 
compress into a set of radial KL modes (Figure EJ. The result is a set of 
“pseudo-Karhunen-Loeve” (PKL) modes each of which is the product of an 
angular and a radial profile. The PKL modes are not perfect, but they cover 
the relevant subspace of Hilbert space without gaps, and that’s all we need. 

Since the goal is to measure power at linear scales, we choose the “signal” 
to be not a realistic power spectrum, but rather an artificial power spectrum 
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KL mode number 

Fig. 4. Amplitudes of 4095 of the 4096 pseudo-Karhunen-Loeve modes in the PSCz 
survey (the missing mode is the mean mode, whose amplitude on this scale is huge). 
The dots are the measured amplitudes, which according to the prior are expected to 
be Gaussianly distributed about zero, with expected standard deviation as given by 
the solid line. The dashed line is the expected standard deviation from shot noise 
alone. 

P{k) oc ® which increases steeply to large scales. Thus our procedure 
favours modes that are sensitive to power at large scales; but a low-noise 
small scale mode can beat out a noisy large scale mode. 

Typically, we start with a few thousand angular modes in spherical 
harmonic space, and apply KL diagonalization to these angular modes. We 
then march through each angular KL mode one by one. Within each angular 
KL mode, we resolve the radial direction into several hundred logarithmic 
spherical waves, and apply KL diagonalization to those. We keep a running 
pool of the best 4000 modes so far. There is no need to go through all the 
angular KL modes. The later angular KL modes contain little information, 
and when we have gone through 10 successive angular KL modes and found 
no new mode good enough to make it into the pool of best modes, we stop. 
The procedure effectively compresses 10®-10® modes into 4000, but remains 
well within the capabilities of a modern workstation. 

Figure^lshows the amplitudes Xi of 4095 of 4096 PKL modes (the omitted 
mode is the mean, equation 0 , whose amplitude, equation is huge 
compared to all other modes) measured in the PSCz survey. According to the 
prior, the amplitudes should be Gaussianly distributed about zero (excepting 
the mean mode), with variances given by the diagonal elements Ca = (Acc^) of 
the (prior) covariance matrix of PKL modes. Indeed, the measured amplitudes 
are consistent with this prior. 
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8 Removing Pair-Integral Bias 


You are undoubtedly familiar with the notion that if you measure both the 
mean and the variance from a data set, then the measurement of the variance 
will be biassed low. The usual fix up is, if you have N independent data, to 
divide the sum of the squared deviations by iV — 1 rather than N. 

Applied to LSS, this bias is known in the literature as the “pair integral 
constraint” (the observed number of neighbours of a galaxy in a survey equals 
the number of galaxies in the survey minus one). 

The simplistic procedure of dividing by — 1 instead of N does not work 
in LSS, but Fisher et al (1993) ^31 pointed out a delightfully simple trick 
that does completely solve the problem. The Fisher et al trick is to isolate the 
mean, the selection function n(r), into a single “mean mode”, and to make 
all other modes orthogonal to the mean. 

In the present context, let V^i(r) denote a PKL mode. The observed 
amplitude Xi of a PKL mode is defined to be 


Xi = 


A{r) 


n{r) 

n{r) 


dV = 


E 

galaxies 


^i{rg) 

g n{rg) 


( 6 ) 


The mean mode ipiir) is defined to be the mode whose shape is the mean 


'tpi{r) = n{r) . 


(7) 


The amplitude of the mean mode is 


Xl 


i(r) d^r = Yj 


gal 


( 8 ) 


the number of galaxies in the survey. Fisher et al’s (1993) m trick is to 
arrange all modes other than the mean to be orthogonal to the mean 

{xi) = J tlJi{r)d^r = 0 . (9) 

This trick ensures that the amplitudes of all modes except the mean mode 
are unaffected (to linear order, anyway) by uncertainty in the mean. The 
resulting power spectrum is unbiassed by the pair integral constraint. Neat! 

The observed amplitude of the mean mode is used in computing the 
maximum likelihood normalization of the selection function, but is then 
discarded from the analysis, because it is impossible to measure the fluctuation 
of the mean mode, just as it is impossible to measure the fluctuation of the 
monopole mode in the CMB. 


9 Local Group Flow 

The motion of the Local Group (LG) induces a dipole in the density distri¬ 
bution around it (Hamilton 1998 eq. (4.42)) (or rather, since the LG is 
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going with the flow, the motion of the LG removes the dipole present in the 
CMB frame). Although the LG mode is a single mode, we choose to project 
the effect, along with the mean mode, into a set of 8 modes, whose angular 
parts are (cut) monopole and dipole (1 + 3 = 4 modes), illustrated in the 
top four panels of Figure [3 and whose radial parts are the mean mode n(r) 
and the LG radial mode d [r^h(r)] jr^dr (1 + 1 = 2 modes), illustrated in 
Figure 01 

Since the motion of the LG through the CMB is known (Bennett et al 
2003 01; Courteau & van den Bergh 1999 0); Lineweaver et al 1996 OHI)) the 
amplitudes of the LG modes can be corrected for this motion, and included 
in the analysis. This is unlike the CMB, where the fluctuations of the dipole 
modes cannot be measured separately from the motion of the Sun, and must 
be discarded. 


10 Isolating Angular and Radial Systematics 

A similar trick can be used to isolate other potential problems into specific 
modes or sets of modes. 

For example, possible angular systematics, associated for example with 
uncertainties in angular extinction across the sky, can be projected into a set 
of purely angular modes (whose radial part is the mean radial mode). Other 
modes should be unaffected (to linear order) by such systematics, because they 
are orthogonal to purely angular variations. If a systematic effect arising from 
extinction were present, then it would show up as a systematic enhancement 
of power in the purely angular modes. 

Similarly, possible radial systematics, associated perhaps with uncertain¬ 
ties in the radial selection function, or with evolution as a function of redshift, 
can be projected into a set of purely radial modes (whose angular part is the 
cut monopole). 

We did not project out purely angular or radial modes in our PSGz 
analysis, but we have been doing this in our more recent work (Tegmark, 
Hamilton & Xu 2002 ^ 21 ; Tegmark et al 2004 |41p. 


11 Redshift Distortions and Logarithmic Spherical 
Waves 

Large scale coherent motions towards overdense regions induce a linear 
squashing effect on the correlation function of galaxies observed in redshift 
space, as illustrated in FigureElfor the PSGz survey. At smaller scales, collapse 
and virialization gives rise to so-called fingers-of-god, visible in FigureElas 
a mild extension of the correlation function along the line-of-sight axis. 

Kaiser (1987) [21] first pointed out the celebrated result that, at linear 
scales, and in the plane-parallel (distant observer) approximation, the Fourier 



Fig. 5. Contour plot of the redshift space two-point correlation function in the PSCz 
survey with the high galactic latitude angular mask. Unlike the analysis discussed in 
the rest of this paper, this plot assumes that redshift distortions are plane-parallel 
(to which end only galaxies beyond 25/i“^Mpc are included). The expected linear 
squashing effect is plainly visible, while nonlinear fingers-of-god show up as a mild 
extension along the line-of-sight axis. Thin, medium, and thick contours represent 
negative, positive, and zero values respectively. The correlation function has been 
smoothed over pair separation r = (r^ + with a tophat window of width 

0.2 dex, and over angles 0 = tan“'^(rj_/r||) to the line of sight with a Gaussian 
window with a Icr width of 10°. From Hamilton, Tegmark & Padmanabhan (2000) 

HU 

amplitude of galaxies in redshift space is amplified over the Fourier 

amplitude 6{k) of mass in real space by a factor b + //r| 

6^^Hk) = {b + f^,l)Sik) ( 10 ) 

where fik = z.k is the cosine of the angle between the wavevector k and the 

4/7 

line of sight z, the quantity b is the linear galaxy-to-mass bias, and / « flm 
is the dimensionless linear growth rate of fluctuations. It follows immediately 
from Kaiser’s formula that, again at linear scales, and in the plane-parallel 
approximation, the redshift space galaxy power spectrum P^{k) is amplified 
over the real space matter power spectrum P{k) by 
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CO 



Fig. 6. The dimensionless log frequency ui is the radial analogue of the dimensionless 
spherical harmonic number 1. A logarithmic spherical wave Zuiim with log frequency 
u! and harmonic number I has wavevector angled effectively at 9 = tan“^(Z/a;) to 
the line-of-sight. 


P^^\k) = {b + ftilfP{k). ( 11 ) 

Translated from Fourier space into real space, Kaiser’s formula predicts the 
large scale squashing effect visible in Figure 0 

For the linear likelihood analysis being considered here, the assumption of 
linear redshift distortions is fine, but the plane-parallel approximation is not 
adequate. Kaiser (1987) already in his original paper, presented formulae 
for radial redshift distortions. A pedagogical derivation can be found in the 
review by Hamilton (1998) |lti|. Unfortunately, when the radial character of 
the redshift distortions is taken into account, the formula for the amplification 
of modes ceases to be anything like as simple as Kaiser’s formula m- Indeed, 
the full, correct formula is more complicated in Fourier space than in real 
space. 

In our pipeline, we use logarithmic spherical waves as the fundamental 
basis with respect to which we express PKL modes, in part because radial 
redshift distortions take a simple form in that basis, as first pointed out by 
Hamilton & Culhane (1996) CHI. Logarithmic spherical waves are products of 
logarithmic radial waves and spherical harmonics Yim{r) 

^c.iru(r) = e‘“'""yz„(f) (12) 

and are eigenmodes of the complete set of commuting Hermitian operators 



If you wonder why no one ever told you about logarithmic spherical waves 
in quantum mechanics, so do I! They are beautiful things. For example, the 
logarithmic radial frequency uj is the radial analogue of the angular harmonic 
number I (see Figure 


UJ ^ radial as I <-> angular 


(14) 


something that everyone ought to know. 
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In logarithmic spherical wave space, the overdensity of galaxies in 
redshift space is related to the overdensity 5^im of mass in real space by 


= 


b + f 


(iw + l/2)(ia; — 1/2) — a{r){iu! — 1/2) 
(iw + Z- l/2)(iw-Z-3/2) 


olm • 


(15) 


Equation m may look complicated, but the amplification factor in square 
brackets on the RHS is just a number, as in the pretty Kaiser formula m- 
OK, so I lied; if you look closely, you will see that equation CHI contains, 
in the expression in the square brackets, a function a{r) of radial depth r, 
defined to be the logarithmic derivative of the radial selection function 


a{r) 


d In [r^n(r)] 
dhur 


(16) 


So the expression OSl) is not as pretty as Kaiser’s. But in practice, it turns out 
that the a{r) factor gets absorbed into another factor of fi{r) at a previous 
step of the pipeline, and so proves not to pose any special difficulty. [If you are 
wondering whether equation (ESI has a rigorous mathematical meaning, the 
answer is yes it does: in real space, a{r) is a diagonal matrix with eigenvalues 
a(r); the symbol a(r) in equation m is the same matrix, but expressed in 
ujlm space.] 


12 Quadratic Compression 

Quadratic compression (Tegmark 1997 m-, Bond, Jaffe & Knox 2000 0) is 
a beautiful idea in which the information in a set of modes is losslessly (or 
almost losslessly) compressed not all the way to cosmological parameters, but 
rather to a set of power spectra. For galaxy power spectra the result is not one 
power spectrum but (at least) three: the galaxy-galaxy, galaxy-velocity, and 
velocity-velocity power spectra. I say “at least” because I expect that in the 
future the drive to reduce systematics from luminosity bias (more luminous 
galaxies are more clustered than faint galaxies) will warrant resolution of 
power spectra into “luminous” and “faint” components. 

The point of reducing to power spectra rather than cosmological param¬ 
eters is that the covariance matrix C in the Gaussian likelihood function 
depends, by definition, linearly on the prior cosmic power spectrum pa'. 

C = J2c,c.Pc. + N . (17) 

oc 

Here Pa denotes a set of cosmic galaxy-galaxy, galaxy-velocity, and velocity- 
velocity powers at various wavenumbers, is shorthand for the derivative 
dC/dpa, and N is the shot noise. For example, we have typically estimated 
the power in 49 bins of logarithmically-spaced wavenumbers, so there are 
49 X 3 = 147 power spectrum parameters Pa to estimate. Normally it would 
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be intractable to find the maximum likelihood solution for 147 cosmological 
parameters, but because the covariance C depends linearly on the powers Pa 
the solution is analytic. 

Although the solution is analytic, it involves 147 matrices Cq each 
4000 X 4000 in size (for 4000 modes), and it still takes a bit of cunning 
to accomplish that solution numerically. A good trick is to decompose the 
(symmetric) covariance matrix C as the product of a lower triangular matrix 
L and its transpose 

C = LL^ . (18) 

The jargon name for this is Cholesky decomposition, and there are fast ways 
to do it. The Fisher matrix of the parameters is 

Fcp = I . (19) 

Then, from the measured amplitudes 6 of the PKL modes, form the shot- 
noise-subtracted quadratic estimator 

i {L-^Sy (L-iC„L-^) (L-M) - (20) 

in which iV„ denotes the shot noise, the self-pair contribution to the main 
term on the right hand side. The expected mean and variance of the quadratic 
estimators Qa are 

{qa)=Fa0Pf3 (21) 

(AqaAq/s) = Fap . ( 22 ) 

Suitably scaled, the quadratic estimates qa can be regarded as smoothed-but- 
correlated estimates of the parameters Pa ■ 

Given equation eu, an estimator of power which we call the “raw” 
estimator, can be defined by 

Pa = F^^ qp (23) 

which is an unbiassed estimator because {pa) = Pa- The raw estimates are 
anti-correlated, with covariance 

{Ap^App) = F-^ . (24) 

This raw estimator exhausts the Cramer-Rao inequality (Paper 1, eq. (31)), 
and therefore no better estimator of power exists. 

The raw estimates Pa, along with their full covariance matrix, contain all 
the information available from the observational data, and can be used in a 
maximum likelihood analysis of cosmological parameters. However, if the raw 
estimates Pa are plotted on a graph, with error bars given by the square root 
of the diagonal element of the covariance matrix, (A“„)^/^, then the result 
gives a misleadingly pessimistic impression of the true uncertainties. This is 
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parameter pj 



Fig. 7. A vector p = {pi,p 2 ) of correlated parameter estimators. The thick 
(blue) dashed lines represent the Principal Component Decomposition of Pa, the 
eigenvectors of their covariance matrix. They are uncorrelated. The thick (green) 
solid lines represent the parameter estimates decorrelated with the square 

root of the Fisher matrix. They are also uncorrelated. The diagram on the right is 
the same as that on the left, but stretched out along the minor axis of the error 
ellipse, so that the error ellipse becomes an error circle. Any two vectors that are 
orthogonal on the error circle are uncorrelated. 

because in plotting errors only from the diagonal elements of the covariance 
matrix, one is effectively discarding useful information in the cross-correlation 
between bins. Thus the raw estimates of power, plotted on a graph, appear 
unnecessarily pessimistic and noisy. 

Whereas the quadratic estimates qa are correlated, and the raw es¬ 
timates Pa is anti-correlated, there are compromise estimators that are, 
like Goldilocks’ porridge, just right. These are the decorrelated estimators 
discussed in the next section. 

It was stated above that the raw estimates Pa and their full covariance 
matrix contain all the information available from the observational data. 
Actually this is not quite true, if one uses a Gaussian approximation to the 
likelihood function as a function of the parameters Pa-, as opposed to the full 
likelihood function. A principal idea behind radical compression (Bond, 
Jaffe & Knox 2000 |3) is to take functions of the parameters arranged so 
as to make the likelihood function as Gaussian in the remapped parameters 
as possible, and hence to extract the last (well, almost the last) ounce of 
information from the data. 
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13 Decorrelation 

Decorrelation, introduced by Hamilton (1997) and first applied (to the 
COBE power spectrum) by Tegmark & Hamilton (1998) is another 
delightful concept, yielding estimates of power at each wavenumber that are 
uncorrelated with all others. A detailed exposition is given by Hamilton & 
Tegmark (2000) We assume throughout this section that the likelihood 
function is well-approximated as a Gaussian in the parameters Pa, so that the 
Fisher matrix equals the inverse covariance matrix. 

The idea of decorrelation applies quite generally to any set of correlated 
estimates of parameters, not just to power spectra. The left panel of Figure[71 
illustrates an example of two correlated parameter estimates pi and p 2 ■ The 
fact that the error ellipse is tilted from horizontal indicates that the parameter 
estimates are correlated. 

There are infinitely many linear combinations of the parameter estimates 
Pi and p 2 of Figure [3 that are uncorrelated. Most obviously, the eigenvectors 
of the covariance matrix - the major and minor axes of the error ellipse 
- are uncorrelated. The decomposition of a set of parameter estimates into 
eigenvectors is called Principal Component Decomposition. 

However, there are infinitely many other ways to form uncorrelated linear 
combinations of correlated parameters. The right panel of Figure 0 is the 
same as the left panel, but stretched out along the minor axis so that the 
error ellipse becomes an error circle. Any two orthogonal vectors on this error 
circle are uncorrelated. For example, one possibility, shown as thick solid lines 
in Figured is to choose the two vectors on the error circle to be parallel to the 
original parameter axes. This choice of uncorrelated parameters has the merit 
that it is in a sense “closest” to the original parameters. When the error circle 
is squashed back to the orginal error ellipse in the left panel of Figured the 
decorrelated parameters (the thick solid lines) are no longer perpendicular, 
but they are nonetheless uncorrelated. 

Exercise 1. Show that these decorrelated parameter estimates (the ones 
corresponding to the thick solid lines in Figured given by □ 

Mathematically, decorrelating a set of correlated parameter estimates is 
equivalent to decomposing their Fisher matrix as 

F = M^AM (25) 

where A is diagonal. The matrix M, which need not be orthogonal, is called 
a decorrelation matrix. The parameter combinations Mp are uncorrelated 
beause their covariance matrix is diagonal: 

{A{Mp)A{Mpf) = M i^Ap/Ap^'^ M'^ = = A-^ . (26) 

Thus each row of the decorrelation matrix M represents a parameter combi¬ 
nation that is uncorrelated with all other rows. It is possible to rescale each 
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row of M so the diagonal matrix A is the unit matrix, so that the parameter 
combinations Mp have unit covariance matrix. Usually however one prefers a 
more physically motivated scaling. 

In the case of the power spectrum, the rows of the decorrelation matrix 
M represent band-power windows, and it is sensible to normalize them to 
unit area (sum of each row is one), so that a measured power Mp can be 
interpreted as the power averaged over the band-power window. 

Figured illustrates the case where the decorrelation matrix is taken to be 
the square root of the Fisher matrix 

M = . (27) 

Applied to the power spectrum, this choice (or rather, a version thereof 
scaled with the prior power - see Hamilton & Tegmark 2000 cni for details) 
provides nicely behaved band-power windows that are (at least at linear 
scales) everywhere positive, and concentrated narrowly about each target 
wavenumber, as illustrated in Figured 

By contrast, principal component decomposition yields broad, wiggly, non¬ 
positive band-power windows that mix power at small and large scales in a 
physically empty fashion. 


14 Disentanglement 

As described in quadratic compression yields estimates of not one 

but three power spectra, the galaxy-galaxy (gg), galaxy-velocity (gv), and 
velocity-velocity (vv) power spectra. The three power spectra are related to 
the true underlying matter power spectrum P{k) by (Kolatt & Dekel 1997 
|77|: Tegmark 1998 HOI; Pen 1998 |^; Tegmark & Peebles 1998 @31 i Dekel 
& Lahav 1999 [111)1: 

galaxy-galaxy power : Pgg{k) = b{k)^P{k) 
galaxy-velocity power : -Pgv(fc) = r{k)b{k)fP{k) (28) 

velocity-velocity power : Pvv{k) = pP{k) 

where b{k) is the (possibly scale-dependent) galaxy-to-mass bias factor, r{k) £ 
[—1,1] is a (possibly scale-dependent) galaxy-velocity correlation coefficient, 
and / is the dimensionless linear growth rate, which is well-approximated by 
(Lahav et al 1991 (23; Hamilton 2001 |17|1 

/ « + (1 + U^/2) Ua/70 . (29) 

More correctly, the ‘velocity’ here refers to minus the velocity divergence, 
which in linear theory is related to the mass (not galaxy) overdensity 5 by 


fSpV ■v = Q, 


(30) 
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Fig. 8. Band-power windows for the disentangled, decorrelated power spectra 
measured from the PSCz survey for (left) the galaxy-galaxy power spectrum Pgg, 
(middle) the galaxy-velocity power spectrum -Fgvi and (right) the velocity-velocity 
power spectrum Pvv Each band-power includes contributions from all three power 
spectra, the (thick black) gg, (medium blue) gv, and (thin red) gg powers, but the 
off-type contributions cancel, according to the prior. Eor example, in the left panel, 
the contributions to the gg band-power from each of the gv and vv powers should 
sum to zero, if the prior is correct. 


where V denotes the comoving gradient in velocity units. 

The linear velocity-velocity power spectrum Pvv{k) is of particular interest 
because, to the extent that galaxy flows trace dark matter flows on large scales 
(which should be true), it offers an unbiassed measure of the shape of the 
matter power spectrum P{k). Indeed if the dimensionless linear growth rate 
/ is taken as a known quantity, then Pvv{k) provides a direct measurement 
of the shape and amplitude of the matter power spectrum P{k). It should be 
cautioned that Pvv(fc) is a direct measure of matter power only at linear scales, 
where redshift distortions conform to the linear Kaiser (1987) m model. At 
nonlinear scales, fingers-of-god are expected to enhance the velocity-velocity 
power above the predicted linear value. 

Although each quadratic estimate Qq,, equation is targeted to measure 
a single power type and a single wavenumber (a single parameter pa), 
inevitably the nature of real surveys causes each estimate qa to contain a 
mixture of all three power spectra at many wavenumbers, in accordance with 
equation 
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What one would like to do is to disentangle the three power spectra, 
projecting out an unmixed version of each. The problem is somewhat similar 
to the CMB problem of forming disentangled TT, TE, EE, and BB power 
spectra from the amplitudes of fluctuations in temperature (T), and in electric 
{E) and magnetic {B) polarization modes {TB and EB cross power spectra 
are expected to vanish because B modes have opposite parity to T and E 
modes). 

Now the raw estimates of power equation are already disen¬ 

tangled, and as remarked in m they and their covariance matrix contain 
(almost) all the information in the observational data. Thus if the aim is 
merely to do a cosmological parameter analysis, then it is fine to stop at the 
raw powers Pa ■ 

However, before leaping to cosmological parameters, it is wise to plot the 
gg, gv, and vv power spectra explicitly. 

One possibility would be to plot each power spectrum marginalized over 
the other two. The drawback with this is that marginalization effectively 
means discarding information contained in the correlations between the power 
spectra, and discarding this information leads to unnecessarily noisy estimates 
of power. 

Instead, we have adopted the following procedure. First, we decorrelate 
the entire set of estimates of power (three power spectra at each of many 
wavenumbers) with the square root of the Fisher matrix (prescaled with the 
prior power - see Hamilton & Tegmark 2000 cni). The result is three power 
spectra every point of which is uncorrelated with every other (with respect to 
both type and wavenumber). The three power spectra are uncorrelated, but 
not disentangled: each uncorrelated power spectrum contains contributions 
from all three types, gg, gv, and vv. To disentangle them, we multiply the three 
uncorrelated power spectra at each wavenumber with a matrix that, if the 
prior is correct, yields pure gg, gv, and vv power spectra. The resulting band- 
power spectra and their error bars represent the information in the survey 
about as fairly as can be. The three power spectra at each wavenumber are 
correlated, but uncorrelated with the power spectra at all other wavenumbers. 

Figure |H1 illustrates the resulting band-power windows for each of the 
gg, gv, and vv power spectra in the PSCz survey. The gg band-power 
window, for example, contains contributions from gv, and vv powers, but 
those contributions should cancel out if the prior power is correct. Note 
that cancellation invokes the prior only weakly: the off-type contributions 
will cancel provided only that the true power has the same shape (not the 
same normalization) as the prior power over a narrow range (because the 
band-power windows are narrow). 

Figure IHI shows the galaxy-galaxy, galaxy-velocity, and velocity-velocity 
power spectra Pgg, Pgv, and Pw measured in this fashion from the PSCz 
survey (the galaxy-galaxy power spectrum Pgg shown here appears in Hamil¬ 
ton, Tegmark & Padmanabhan 2000 EH, but Pgv and Pw have not appeared 
elsewhere). As can be seen, the galaxy-velocity and velocity-velocity power 
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Fig. 9. Points with error bars show the decorrelated power spectra measured from 
the PSCz survey: (thick black) the galaxy-galaxy power spectrum ^gg) (medium blue 
dotted) the galaxy-velocity power spectrum Pgv, and (thin red dashed) the velocity- 
velocity power spectrum Pgv. The gv and vv powers are plotted with coarser binning 
because they are noisier than the gg power. Each point is plotted at the median 
wavenumber of its band-power window; the horizontal error bar gives the EWHM 
of the band-power window. The smooth solid (green) line running through the 
galaxy-galaxy power spectrum is the linear concordance ACDM model of Tegmark, 
Zaldarriaga & Hamilton (2001) 1461 . 


spectra are noisy, but well detected. Nonlinear fingers-of-god cause the galaxy- 
velocity power Pgv to go negative at fc > 0.2/iMpc”^, and at the same time 
enhance the velocity-velocity power Pw somewhat. 
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15 Conclusion 

In its simplest form, measuring the galaxy power spectrum is almost trivial: 
bung the survey in a box, Fourier transform it, and measure the resulting 
power spectrum (Baumgart & Fry 1991) |2]- A variant of this method, the 
Feldman, Kaiser & Peacock (1994) m method, in which the galaxy density is 
weighted by a weighting which is near-minimum-variance subject to some not- 
necessarily-true assumptions before Fourier transforming it, offers a fast and 
not-so-bad way to measure power spectra, good for when maximum likelihood 
would be overkill. 

However, the “right” way to measure power spectra, at least at large, linear 
scales, is to use Bayesian maximum likelihood analysis. Maximum likelihood 
methods for measuring galaxy power spectra were pioneered by Fisher, Scharf 
& Lahav (1994) and Heavens & Taylor (1995) and have seen 

considerable refinement in recent years. Maximum likelihood methods require 
a lot more work than traditional methods. The pay-off is not necessarily 
more precision (the error bars are all too often larger than claimed error 
bars from more primitive methods), but more precision in the precision. 
That is, you can have some confidence that the derived uncertainties reliably 
reflect the information in the data, no more and no less. This is essential 
when measurements are to be used in estimating cosmological parameters. 
Moreover maximum likelihood methods provide a general way to deal with 
all the complications of real galaxy surveys, as spherical redshift distortions, 
Local Group flow, variable extinction, and such-like. 
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